Abstract: The recently proposed set-up of source coding with a 
side information "vending machine" allows the decoder to select 
actions in order to control the quality of the side information. The 
actions can depend on the message received from the encoder and 
on the previously measured samples of the side information, and 
are cost constrained. Moreover, the final estimate of the source by 
the decoder is a function of the encoder's message and depends 
causally on the side information sequence. Previous work by 
Permuter and Weissman has characterized the rate-distortion- 
cost function in the special case in which the source and the 
"vending machine" are memoryless. In this work, motivated by 
the related channel coding model introduced by Kramer, the rate- 
distortion-cost function characterization is extended to a model 
with in-block memory. Various special cases are studied including 
block-feedforward and side information repeat request models. 

Index Terms: Source coding, block memory, side information 
"vending machine", feedforward, directed mutual information. 

I. Introduction and System Model 

Consider the problem of source coding with controllable 
side information illustrated in Fig. 1. The encoder compresses 
a source $X^n = [X_1, \ldots, X_n]$ to a message $W$ of $R$ bits per 
source symbol. The decoder, based on the message $W$, takes 
actions $A_i$ for all $i = 1, \ldots, n$, so as to control in a causal 
fashion the measured side information sequence $Y^n$. The 
action $A_i$ is allowed to be a function of the previously measured 
values $Y^{i-1}$ of the side information, and the final estimate 
$\hat{X}_i$ is obtained by the decoder based on the message $W$ and as a 
causal function of the side information samples. The problem 
of characterizing the set of achievable tuples of rate $R$, average 
distortion $D$ and average action cost $\Gamma$ was solved in [1, Sec. 
II.E] under the assumptions of a memoryless source $X^n$ and of 
a memoryless probabilistic model for the side information $Y^n$ 
when conditioned on the source and the action sequences^1. 
The distribution of the side information sequence given the 
source and action sequences is referred to as side information 
"vending machine" in [1]. 

In this work, we generalize the characterization of the rate- 
distortion-cost performance for the set-up in Fig. 1 from 
the memoryless scenario treated in [1] to a model in which 
source and side information "vending machine" have in-block 
memory (iBM). With iBM, the probabilistic models for source 
and "vending machine" have memory limited to blocks of size 
$L$ samples, where $L$ does not grow with the coding length 
$n$, as detailed below. The model under study is motivated by 
the channel coding scenario put forth in [2] and can be considered 
to be the source coding counterpart of the latter. 

^1 The mentioned characterization in [1, Sec. II.E] generalizes the result 
in [3, Sec. II], which is restricted to a model with action-independent side 
information. 

Fig. 1. Source coding with in-block memory (iBM) and causally controllable 
side information. 

Notation: We write $[a, b] = [a, a+1, \ldots, b]$ for integers $b > a$; 
$[a, b] = a$ if $a = b$; and $[a, b]$ is empty otherwise. For 
a sequence of scalars $x_1, \ldots, x_n$, we write $x^n = [x_1, \ldots, x_n]$ 
and $x^0$ for the empty vector. The same notation is used for 
sequences of random variables $X^n = [X_1, \ldots, X_n]$, or sets 
$\mathcal{X}^n = [\mathcal{X}_1, \ldots, \mathcal{X}_n]$. 

A. System Model 

The system, illustrated in Fig. 1, is described by the follow- 
ing random variables. 

• A source $X^n$ with iBM of length $L$. The source $X^n$ 
consists of $m$ blocks 

$$X_i^L = (X_{(i-1)L+1}, \ldots, X_{iL}) \qquad (1)$$

with $i \in [1, m]$, each of $L$ symbols, so that $n = mL$. The 
alphabet is possibly changing across each $L$-block, that 
is, we have $X_i \in \mathcal{X}_{t(i)+1}$, for $L$ alphabets $\mathcal{X}_1, \ldots, \mathcal{X}_L$, 
where we have defined 

$$t(i) = r(i - 1, L) \qquad (2)$$

with $r(x, y)$ being the remainder of $x$ divided by $y$. 

• A message $W \in [1, 2^{nR}]$ with $R$ being the rate measured 
in bits per source symbol. 

• An action sequence $A^n$ with $A_i \in \mathcal{A}_{t(i)+1}$ for $L$ 
alphabets $\mathcal{A}_1, \ldots, \mathcal{A}_L$. 

• A side information sequence $Y^n$ with $Y_i \in \mathcal{Y}_{t(i)+1}$ for 
$L$ alphabets $\mathcal{Y}_1, \ldots, \mathcal{Y}_L$. 

• A source estimate $\hat{X}^n$ with $\hat{X}_i \in \hat{\mathcal{X}}_{t(i)+1}$ for $L$ alphabets 
$\hat{\mathcal{X}}_1, \ldots, \hat{\mathcal{X}}_L$. 

Fig. 2. An action codetree $v^n(w, \cdot)$ for a given message $w \in [1, 2^{nR}]$ 
($\mathcal{Y}_i = \{0, 1\}$, $n = 3$). 
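The indexing convention in (2) can be illustrated with a minimal Python sketch. The function names `t` and `block_index` are hypothetical helpers introduced here for illustration; they compute the within-block offset $t(i) = r(i-1, L)$ and the index $\lceil i/L \rceil$ of the block containing symbol $i$.

```python
def t(i: int, L: int) -> int:
    """Offset t(i) = r(i - 1, L) of symbol i inside its L-block (0-based)."""
    return (i - 1) % L


def block_index(i: int, L: int) -> int:
    """Index ceil(i / L) of the L-block that contains symbol i (1-based)."""
    return (i + L - 1) // L


if __name__ == "__main__":
    L = 3
    for i in range(1, 7):
        # Symbol i uses alphabet number t(i) + 1 and belongs to block ceil(i / L).
        print(i, t(i, L) + 1, block_index(i, L))
```

With $L = 3$, symbols 1, 2, 3 use alphabets 1, 2, 3 in block 1, and symbols 4, 5, 6 use the same alphabets again in block 2.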



In order to simplify the notation, in the following, we will 
write $\mathcal{X}_i$ to denote $\mathcal{X}_{t(i)+1}$ also for $i > L$, and similarly for the 
alphabets $\mathcal{A}_i$, $\mathcal{Y}_i$ and $\hat{\mathcal{X}}_i$. The variables are related as follows. 

• The source $X^n$ has iBM of length $L$ in the sense that it 
is characterized as 

$$X_i = f_{t(i)+1}(Z_{\lceil i/L \rceil}) \qquad (3)$$

for some functions $f_i : \mathcal{Z} \to \mathcal{X}_i$, with $i \in [1, L]$, 
where $Z_j$, with $j \in [1, m]$, is a memoryless process with 
probability distribution $P(z)$. Note that (3) is equivalent 
to the condition that the distribution $P(x^n)$ factorizes as 

$$P(x^n) = \prod_{i=1}^{m} P(x_i^L).$$

• The encoder maps the source $X^n$ into a message $W \in 
[1, 2^{nR}]$ according to some function $h : \mathcal{X}^n \to [1, 2^{nR}]$ 
as $W = h(X^n)$. To denote functional, rather than more 
general probabilistic, conditional dependence, we use the 
notation $1(w \mid x^n)$. 

• The decoder observes the message $W$ and takes actions 
$A^n$ based also on the observation of the past samples 
of the side information sequence. Specifically, for each 
symbol $i \in [1, n]$ the action $A_i$ is selected as 

$$A_i = v_i(W, Y^{i-1}) \qquad (4)$$

for some functions $v_i : [1, 2^{nR}] \times \mathcal{Y}^{i-1} \to \mathcal{A}_i$. This condi- 
tional functional dependence is denoted as $1(a_i \mid v^i, y^{i-1})$, 
where $v^n = v^n(w, \cdot)$ represents the action codetree (or 
action strategy) for a given message $w \in [1, 2^{nR}]$ in the 
time interval $i \in [1, n]$, that is, the collection of functions 
$v_i(w, \cdot)$ in (4) for all $i \in [1, n]$. A codetree $v^n(w, \cdot)$ is 
illustrated in Fig. 2 for $\mathcal{Y}_i = \{0, 1\}$ and $n = 3$. Note 
that the subtrees $v^i(w, \cdot)$ with any $i \in [1, n]$ can also be 
obtained from Fig. 2. 

• The side information has iBM of length $L$ in the sense 
that it is generated as a function of the previous actions 
taken in the same block and of the variable $Z_{\lceil i/L \rceil}$ (cf. 
(3)) as follows 

$$Y_i = g_{t(i)+1}(A_{i-t(i)}^{i}, Z_{\lceil i/L \rceil}) \qquad (5)$$

for some functions $g_i : \mathcal{A}^i \times \mathcal{Z} \to \mathcal{Y}_i$, with $i \in [1, L]$. 
Note that, as a special case, if the functions $g_i$ do 
not depend on the actions, equations (3) and (5) imply 
that the sequences $X^n$ and $Y^n$ are $L$-block memoryless 
in the sense that their joint distribution factorizes as 

$$P(x^n, y^n) = \prod_{i=1}^{m} P(x_i^L, y_i^L).$$

Fig. 3. A decoder codetree $u^n(w, \cdot)$ for a given message $w \in [1, 2^{nR}]$ 
($\mathcal{Y}_i = \{0, 1\}$, $n = 3$). 
• The decoder, based on the received message $W$ along 
with the current and past samples of the side information 
sequence, produces the estimated sequence $\hat{X}^n$. Specif- 
ically, at each symbol $i \in [1, n]$, the estimate $\hat{X}_i$ is 
selected as 

$$\hat{X}_i = u_i(W, Y^i) \qquad (6)$$

for some functions $u_i : [1, 2^{nR}] \times \mathcal{Y}^i \to \hat{\mathcal{X}}_i$. This condi- 
tional functional dependence is denoted as $1(\hat{x}_i \mid u^i, y^i)$, 
where $u^n(w, \cdot)$ represents the decoder codetree (or de- 
coder strategy) for a given message $w \in [1, 2^{nR}]$ in 
the time interval $i \in [1, n]$, that is, the collection of 
functions $u_i(w, \cdot)$ in (6) for all $i \in [1, n]$. A codetree 
$u^n(w, \cdot)$ (along with the subtrees $u^i(w, \cdot)$ with $i \in [1, n]$) 
is illustrated in Fig. 3 for $\mathcal{Y}_i = \{0, 1\}$ and $n = 3$. 
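The relations (3)-(6) can be traced step by step with a short Python sketch. Everything in the toy instance below (the binary alphabets, the erasure symbol "e", and the particular rules chosen for `f`, `g`, `v` and `u`) is a hypothetical choice made for illustration only; the sketch merely shows the causal order in which source symbols, actions, side information and estimates are produced.

```python
def t(i, L):
    """Offset of time index i inside its L-block, as in (2)."""
    return (i - 1) % L


def simulate(n, L, w, z_blocks, f, g, v, u):
    """Generate (x, a, y, x_hat) for one realization of the block variables.

    f[k](z): source symbol at block position k (0-based), as in (3).
    g[k](a_block, z): side information at block position k, as in (5).
    v(i, w, y_past): action at time i from the message and y^{i-1}, as in (4).
    u(i, w, y_upto): estimate at time i from the message and y^i, as in (6).
    """
    x, a, y, x_hat = [], [], [], []
    for i in range(1, n + 1):
        k = t(i, L)
        z = z_blocks[(i - 1) // L]      # Z variable of the current block
        x.append(f[k](z))
        a.append(v(i, w, tuple(y)))     # action depends on past samples only
        a_block = tuple(a[i - 1 - k:])  # actions taken so far in this block
        y.append(g[k](a_block, z))
        x_hat.append(u(i, w, tuple(y)))  # estimate may use the current sample
    return x, a, y, x_hat


if __name__ == "__main__":
    # Hypothetical toy instance with L = 2 and binary symbols.
    L = 2
    f = [lambda z: z[0], lambda z: z[1]]
    # The sample reveals the source symbol only if the current action is 1.
    g = [lambda a_blk, z: z[0] if a_blk[-1] == 1 else "e",
         lambda a_blk, z: z[1] if a_blk[-1] == 1 else "e"]
    # Measure at the start of a block; afterwards only if the last sample was 1.
    v = lambda i, w, y_past: 1 if t(i, L) == 0 else int(y_past[-1] == 1)
    # Use the current sample as the estimate, or w % 2 if it is erased.
    u = lambda i, w, y_upto: y_upto[-1] if y_upto[-1] != "e" else w % 2
    print(simulate(4, L, 0, [(1, 0), (0, 1)], f, g, v, u))
```

Note that the side information at block position `k` only sees the actions of its own block, which is the in-block memory property required by (5).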
Overall, the probability distribution of the random variables 
$(X^n, V^n, A^n, U^n, Y^n, \hat{X}^n)$ factorizes as 

$$\left[\prod_{i=1}^{m} P(x_i^L)\right] P(v^n, u^n \mid x^n) \, 1(a^n \| v^n, 0y^{n-1}) \left[\prod_{i=1}^{m} P(y_i^L \| a_i^L \mid x_i^L)\right] 1(\hat{x}^n \| u^n, y^n), \qquad (7)$$

where we have used the directed conditioning notation in H. 
Accordingly, we have defined 

$$P(y^L \| a^L \mid x^L) = \prod_{i=1}^{L} P(y_i \mid a^i, x^L) \qquad (8)$$

and similarly for the deterministic conditional relationships 

$$1(a^n \| v^n, 0y^{n-1}) = \prod_{i=1}^{n} 1(a_i \mid v^i, y^{i-1}) \qquad (9)$$

and 

$$1(\hat{x}^n \| u^n, y^n) = \prod_{i=1}^{n} 1(\hat{x}_i \mid u^i, y^i). \qquad (10)$$

Fig. 4. FDG for a source coding problem with iBM of length L = 2 and 
n = 4 source symbols (and hence m = 2 blocks). The two blocks are shaded 
and the functional dependence on the side information is drawn with dashed 
lines. 

A function dependence graph (FDG) (see, e.g., [4]) illustrating 
the joint distribution (7) for $L = 2$ and $n = 4$ (and thus $m = 2$) 
is shown in Fig. 4. 

Remark 1. In (7), the functions $1(a^n \| v^n, 0y^{n-1})$ and 
$1(\hat{x}^n \| u^n, y^n)$ are fixed, as they represent the map from 
the branches of the codetrees $v^n$ and $u^n$, as indexed by the 
side information sequence, to the action $a_i$ and the estimate $\hat{x}_i$, as 
illustrated in Fig. 2 and Fig. 3, respectively. 

Fix a non-negative and bounded function $d^L(x^L, \hat{x}^L)$ with 
domain $\mathcal{X}^L \times \hat{\mathcal{X}}^L$ to be the distortion metric and a non-negative 
and bounded function $\gamma^L(a^L, x^L)$ with domain $\mathcal{A}^L \times \mathcal{X}^L$ 
to be the action cost metric. Under the selected metrics, a 
triple $(R, D, \Gamma)$ is said to be achievable with distortion $D$ and 
cost constraint $\Gamma$ if, for all sufficiently large $m$, there exist 
codetrees such that 

$$\frac{1}{mL} \sum_{i=1}^{m} E[d^L(X_i^L, \hat{X}_i^L)] \le D + \epsilon \qquad (11)$$

and 

$$\frac{1}{mL} \sum_{i=1}^{m} E[\gamma^L(A_i^L, X_i^L)] \le \Gamma + \epsilon \qquad (12)$$

for any $\epsilon > 0$. The rate-distortion-cost function $R(D, \Gamma)$ is 
the infimum of all achievable rates with distortion $D$ and cost 
constraint $\Gamma$. 

Remark 2. The system model under study reduces to that 
investigated in [1, Sec. II.E] for the special case with memo- 
ryless sources, i.e., with $L = 1$. 

II. Main Results 

In this section, the rate-distortion-cost function $R(D, \Gamma)$ is 
derived and some of its properties are discussed. The next 
section illustrates various special cases and connections to 
previous works. 

Fig. 5. A codetree $j^{n+1}(w, \cdot)$ for a given message $w \in [1, 2^{nR}]$ 
($\mathcal{Y}_i = \{0, 1\}$, $n = 3$). 


A. Equivalent Formulation 

We start by showing that the problem can be formulated 
in terms of a single codetree. This contrasts with the more 
natural definitions given in the previous section, in which two 
separate codetrees, namely $v^n(w, \cdot)$ and $u^n(w, \cdot)$, were used 
(see Fig. 2 and Fig. 3). Towards this end, we define a "joint" 
codetree $j^{n+1}(w, \cdot) = (j_1(w, \cdot), \ldots, j_{n+1}(w, \cdot))$ that satisfies 
the functional dependencies 

$$1(a_i \mid j^i, y^{i-1}) = 1(a_i \mid v^i, y^{i-1}) \qquad (13)$$

and 

$$1(\hat{x}_i \mid j_2^{i+1}, y^i) = 1(\hat{x}_i \mid u^i, y^i) \qquad (14)$$

for all $i \in [1, n]$. The codetree $j^{n+1}(w, \cdot)$ is illustrated in Fig. 
5 for $n = 3$. Note that the subtree $j^1(w, \cdot)$ only specifies the 
action $a_1$ to be taken at time $i = 1$, while the leaves of 
the tree $j^{n+1}(w, \cdot)$ are indexed solely by the estimated value 
$\hat{x}_n$. 

With this definition, from (7), the probability distribution of 
the random variables $(X^n, J^{n+1}, A^n, Y^n, \hat{X}^n)$ factorizes as 

$$\left[\prod_{i=1}^{m} P(x_i^L)\right] P(j^{n+1} \mid x^n) \, 1(a^n \| j^n, 0y^{n-1}) \cdot 1(\hat{x}^n \| j_2^{n+1}, y^n) \prod_{i=1}^{m} P(y_i^L \| a_i^L \mid x_i^L), \qquad (15)$$

where we recall that we have $1(\hat{x}^n \| j_2^{n+1}, y^n) = \prod_{i=1}^{n} 1(\hat{x}_i \mid j_2^{i+1}, y^i)$. 
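One way to picture the joint codetree is to merge the two separate codetrees node by node. The Python sketch below assumes a dictionary representation that is not part of the original text: `v[i]` maps each past sequence $y^{i-1}$ (a tuple) to the action $a_i$, and `u[i]` maps each $y^i$ to the estimate $\hat{x}_i$. Under this assumption, the root of the joint tree holds $a_1$, each intermediate node at depth $i$ holds the pair $(\hat{x}_i, a_{i+1})$, and the leaves hold only $\hat{x}_n$, in line with (13) and (14).

```python
from itertools import product


def make_trees(n, alphabet, action_rule, estimate_rule):
    """Build action and decoder codetrees as dictionaries keyed by y-tuples."""
    v = {i: {y: action_rule(y) for y in product(alphabet, repeat=i - 1)}
         for i in range(1, n + 1)}
    u = {i: {y: estimate_rule(y) for y in product(alphabet, repeat=i)}
         for i in range(1, n + 1)}
    return v, u


def joint_codetree(v, u, n):
    """Merge v^n and u^n into a single codetree j^{n+1}."""
    j = {1: {(): v[1][()]}}                 # root: first action only
    for i in range(2, n + 1):
        # Node reached after observing y^{i-1}: estimate for time i-1 and
        # action for time i.
        j[i] = {y: (u[i - 1][y], v[i][y]) for y in v[i]}
    j[n + 1] = dict(u[n])                   # leaves: last estimate only
    return j


if __name__ == "__main__":
    # Hypothetical rules: action = parity of past samples, estimate = last sample.
    v, u = make_trees(3, (0, 1), lambda y: sum(y) % 2, lambda y: y[-1])
    j = joint_codetree(v, u, 3)
    print(j[1], j[2])
```

The merged tree has $n + 1$ levels, which is why the joint codetree is indexed up to $n + 1$ rather than $n$.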



B. Rate-Distortion-Cost Function 

Using the representation in terms of a single codetree 
given above, we now provide a characterization of the rate- 
distortion-cost function. 

Proposition 1. The rate-distortion-cost function is given by 

$$R(D, \Gamma) = \frac{1}{L} \min I(X^L; J^{L+1}) \qquad (16)$$

where the joint distribution of the variables $X^L, Y^L, A^L, \hat{X}^L$ 
and of the codetree $J^{L+1}$ factorizes as 

$$P(x^L) P(j^{L+1} \mid x^L) \, 1(a^L \| j^L, 0y^{L-1}) \cdot 1(\hat{x}^L \| j_2^{L+1}, y^L) P(y^L \| a^L \mid x^L), \qquad (17)$$

and the minimization is performed over the conditional distri- 
bution $P(j^{L+1} \mid x^L)$ of the codetree under the constraints 

$$\frac{1}{L} E[d^L(X^L, \hat{X}^L)] \le D \qquad (18)$$

and 

$$\frac{1}{L} E[\gamma^L(A^L, X^L)] \le \Gamma. \qquad (19)$$

Proof: The achievability of Proposition 1 follows from 
classical random coding arguments. Specifically, the encoder 
draws the codetrees $j^{n+1}(w, \cdot)$ for all $w \in [1, 2^{n(R(D,\Gamma)+\delta)}]$ with 
some $\delta > 0$, as follows. First, for each $w \in [1, 2^{n(R(D,\Gamma)+\delta)}]$ a 
concatenation of $m$ codetrees $j_i^{L+1}(w, \cdot)$ of length $L + 1$, with 
$i \in [1, m]$, is generated, such that the constituent codetrees 
$j_i^{L+1}(w, \cdot)$ are i.i.d. and distributed with probability $P(j^{L+1})$. 
The codetree $j^{n+1}(w, \cdot)$ is then obtained by combining the 
leaves and the root of successive constituent codetrees: the 
leaves of the past codetree specify the estimates for the 
previous time instant, while the root of the next codetree 
specifies the action for the current time instant. The procedure 
is illustrated in Fig. 6. 

Encoding is performed by looking for a message 
$w \in [1, 2^{n(R(D,\Gamma)+\delta)}]$ such that the corresponding pair 
$(x^n, j^{n+1}(w, \cdot))$ is (strongly) jointly typical with respect to 
the joint distribution $P(x^L) P(j^{L+1} \mid x^L)$, when the sequences 
$(x^n, j^{n+1}(w, \cdot))$ are seen as the memoryless $m$-sequences 
$(x_1^L, j_1^{L+1}(w, \cdot)), \ldots, (x_m^L, j_m^{L+1}(w, \cdot))$. By the covering lemma 
[6, Lemma 3.3], rate $\frac{1}{L} I(X^L; J^{L+1})$ suffices to guaran- 
tee the reliability of this step. Moreover, if the distribution 
$P(j^{L+1} \mid x^L)$ is selected so as to satisfy (18) and (19), then, by 
the typical average lemma [6], the constraints (11) and (12) 
are also guaranteed to be met for sufficiently large $n$. The 
proof of the converse can be found in Appendix A. ■ 

Remark 3. The rate-distortion-cost function can also be ex- 
pressed in terms of two separate codetrees using the definitions 
given in Sec. I-A. Specifically, following similar steps as in 
the proof of Proposition 1, the rate-distortion-cost function can 
be expressed as the minimization 

$$R(D, \Gamma) = \frac{1}{L} \min I(X^L; V^L, U^L) \qquad (20)$$

where the joint distribution of the variables $X^L, Y^L, A^L, \hat{X}^L$ 
and of the codetrees $V^L$ and $U^L$ factorizes as 

$$P(x^L) P(v^L, u^L \mid x^L) \, 1(a^L \| v^L, 0y^{L-1}) \cdot 1(\hat{x}^L \| u^L, y^L) P(y^L \| a^L \mid x^L), \qquad (21)$$

and the minimization is performed over the conditional dis- 
tribution $P(v^L, u^L \mid x^L)$ of the codetrees under the constraints 
(18) and (19). 

Remark 4. The rate-distortion-cost function in Proposition 1 
does not include auxiliary random variables, since the codetree 
$J^{L+1}$ is part of the problem specification. This is unlike 
the characterization given in [1] for the memoryless case. 
Moreover, problem (16) is convex in the unknown $P(j^{L+1} \mid x^L)$ 
and hence can be solved using standard algorithms. It is also 
noted that, extending [5], one may devise a Blahut-Arimoto- 
type algorithm for the calculation of the rate-distortion-cost 
function. This aspect is not further investigated here. 

Based on the definition of $J^{L+1}$, we have the following 
cardinality bound on the number of codetrees to be considered 
in the optimization (16): 

$$|\mathcal{J}^{L+1}| \le |\mathcal{A}_1| \, |\hat{\mathcal{X}}_L|^{|\mathcal{Y}^L|} \prod_{i=1}^{L-1} \left(|\hat{\mathcal{X}}_i| \, |\mathcal{A}_{i+1}|\right)^{|\mathcal{Y}^i|}. \qquad (22)$$

The following corollary shows that this cardinality bound 
can be improved. 

Corollary 1. In the optimization (16), the number of codetrees 
$J^{L+1}$ can be limited as 

$$|\mathcal{J}^{L+1}| \le |\mathcal{X}^L| + 3 \qquad (23)$$

without loss of optimality. 

Proof: See Appendix B. ■ 
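To get a feel for the gap between the two bounds, the count in (22) can be evaluated numerically. The Python sketch below assumes the reading of (22) in which the root contributes $|\mathcal{A}_1|$ choices, each node at depth $i \in [1, L-1]$ contributes $|\hat{\mathcal{X}}_i||\mathcal{A}_{i+1}|$ choices, and each leaf contributes $|\hat{\mathcal{X}}_L|$ choices; the function names are hypothetical.

```python
from math import prod


def codetree_count(A, Y, Xhat):
    """Right-hand side of (22) for alphabet sizes A, Y, Xhat (lists of length L)."""
    L = len(A)

    def num_y_sequences(i):
        # |Y^i|: number of side information sequences of length i in a block.
        return prod(Y[:i])

    count = A[0] * Xhat[L - 1] ** num_y_sequences(L)
    for i in range(1, L):
        count *= (Xhat[i - 1] * A[i]) ** num_y_sequences(i)
    return count


def corollary_bound(X):
    """Right-hand side of (23) for source alphabet sizes X (list of length L)."""
    return prod(X) + 3


if __name__ == "__main__":
    binary = [2, 2]  # all alphabets binary, L = 2
    print(codetree_count(binary, binary, binary), corollary_bound(binary))
```

For binary alphabets and $L = 2$, the direct count (22) gives 512 codetrees, whereas (23) shows that 7 suffice.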

Remark 5. The achievable scheme used to prove Proposition 
1 adapts the actions only to the side information samples 
corresponding to the same $L$-block. More precisely, the action 
$A_i$ depends, through the selected codetree, only on the side 
information samples $Y_{i-t(i)}, \ldots, Y_{i-1}$. Since the problem defi- 
nition allows, via (4), for actions that depend on all past side 
information samples, namely $Y^{i-1}$, this result demonstrates 
that adapting the actions across the blocks cannot improve the 
rate-distortion-cost function. This is consistent with the finding 
in [1], where it is shown that adaptive actions do not improve 
the rate-distortion performance for a memoryless model, i.e., 
with $L = 1$. Similarly, one can conclude from Proposition 1 
that, while adapting the estimate $\hat{X}_i$ to the side information 
samples within the same $L$-block, namely $Y_{i-t(i)}, \ldots, Y_i$, is 
generally advantageous, adaptation across the blocks is not. 
This extends the results in [1], in which it is shown that, for 
$L = 1$, the estimate can depend only on the current value of 
the side information without loss of optimality. 

III. Special Cases and Examples 

In this section, we detail some further consequences of 
Proposition 1 and connections with previous work. 

Fig. 6. Illustration of the achievable scheme used in the proof of Proposition 
1 for binary alphabets $\mathcal{Y}_i = \{0, 1\}$ with $m = 2$ and $L = 2$. In the top figure, 
the codetrees $j_i^3(w, \cdot)$ for $i = 1, 2$, which are generated i.i.d. with probability 
$P(j^3)$, are depicted. In the bottom figure, the resulting codetree $j^5(w, \cdot)$ 
is shown. It is noted that the action in the codetree $j^5(w, \cdot)$ in the bottom 
figure is obtained from the codetree $j_i^3(w, \cdot)$ with $i = 2$ on the top, and is 
thus independent of the value of $y^2$. 

A. Memoryless Source (L = 1) 

As mentioned in Remark 2, if $L = 1$, the model at 
hand reduces to the standard one with memoryless sources, 
in which the joint distribution of $X^n$ and $Y^n$ factorizes as 
$\prod_{i=1}^{n} P(x_i, y_i)$. This model was studied in [1], where the rate- 
distortion-cost function was derived. The result in [1, Sec. II.E] 
can be seen to be a special case of Proposition 1. 

B. Action-Independent Side Information 

Here we consider the case in which the side information 
is action independent, that is, we have $P(y^L \| a^L \mid x^L) = 
P(y^L \mid x^L)$. Under this assumption, the action sequence does 
not need to be included in the model, and, from (20), the 
rate-distortion function is given by 

$$R(D) = \frac{1}{L} \min I(X^L; U^L), \qquad (24)$$

where the joint distribution of the variables $X^L, Y^L, \hat{X}^L$ and 
of the codetree $U^L$ factorizes as 

$$P(x^L) P(u^L \mid x^L) \, 1(\hat{x}^L \| u^L, y^L) P(y^L \mid x^L), \qquad (25)$$

and the minimization is performed over the conditional distri- 
bution $P(u^L \mid x^L)$ of the codetrees under the constraint (18). 
Note that, given the absence of actions, we have used the 
formulation in terms of individual codetrees discussed in 
Remark 3 in order to simplify the notation. Using arguments 
similar to Corollary 1, one can show that the size of the 
codetree alphabet can be limited to $|\mathcal{U}^L| \le |\mathcal{X}^L| + 2$ without 
loss of optimality. For $L = 1$, the characterization (24) reduces 
to the one derived in [5, Sec. II]. 

C. Block-Feedforward Model 

As a specific instance of the setting with action-independent 
side information, we consider here the block-feedforward 
model in which we have $Y_i = X_{i-1}$ for all $i$ such that $i - 1$ is 
not a multiple of $L$, and $Y_i$ equal to a fixed symbol in $\mathcal{Y}_i$ 
otherwise. This model is 
related to the feedforward set-up studied in [7], [8], [9], with 
the difference that here feedforward is limited to within the $L$- 
blocks. In other words, the side information is $Y_i = X_{i-1}$ only 
if $X_{i-1}$ is in the same $L$-block as $Y_i$ and is not informative 
otherwise. We now show that, similar to [8], the rate-distortion 
function with block-feedforward can be expressed in terms of 
directed information and does not entail an optimization over 
the codetrees. 

Corollary 2. For the block-feedforward model, the rate- 
distortion function is given by 

$$R(D) = \frac{1}{L} \min I(\hat{X}^L \to X^L) \qquad (26)$$

where the joint distribution of the variables $X^L$, $Y^L$ and $\hat{X}^L$ 
factorizes as 

$$P(x^L) P(\hat{x}^L \mid x^L) P(y^L \mid x^L), \qquad (27)$$

and the minimization is performed over the conditional distri- 
bution $P(\hat{x}^L \mid x^L)$ under the constraint (18). 

Remark 6. In the feedforward model studied in [7], [8], [9], 
feedforward of the source $X^n$ is not restricted to take place 
only within the $L$-blocks, namely we have $Y_i = X_{i-1}$ for all 
$i \in [1, n]$. As a result, the rate-distortion function is shown 
there to be given by the limit of (26) over $L$. 

Proof: The achievability is obtained by using concate- 
nated codetrees of length $L$ similar to Proposition 1. However, 
unlike Proposition 1, the codetrees are generated according to 
the distribution $P(\hat{x}^L \| 0x^{L-1})$ as done in [8]. The proof of 
achievability is completed as in [8], [9]. As for the converse, 
starting from (24), we write 

$$I(X^L; U^L) = \sum_{i=1}^{L} I(X_i; U^L \mid X^{i-1})$$
$$= \sum_{i=1}^{L} I(X_i; U^L, \hat{X}^i \mid X^{i-1})$$
$$\ge \sum_{i=1}^{L} I(X_i; \hat{X}^i \mid X^{i-1})$$
$$= I(\hat{X}^L \to X^L), \qquad (28)$$

where the second equality follows since $\hat{X}^i$ is a function of the 
codetree $U^L$ and of $Y^i$, and, in the block-feedforward model, $Y^i$ 
is in turn a function of $X^{i-1}$; the inequality follows by the 
non-negativity of the mutual information; and the last equality 
is a consequence of the definition of directed information PI . 

■ 
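The directed information appearing in (26) and (28) can be evaluated numerically for small alphabets. The Python sketch below assumes the definition $I(\hat{X}^L \to X^L) = \sum_{i=1}^{L} I(X_i; \hat{X}^i \mid X^{i-1})$ used in the last step of (28), and takes as input a joint pmf given as a dictionary whose keys are tuples $(x_1, \ldots, x_L, \hat{x}_1, \ldots, \hat{x}_L)$. This data layout and the function names are illustrative assumptions.

```python
from collections import defaultdict
from math import log2


def marginal(pmf, idx):
    """Marginal pmf of the coordinates listed in idx."""
    out = defaultdict(float)
    for outcome, p in pmf.items():
        out[tuple(outcome[k] for k in idx)] += p
    return out


def cond_mutual_info(pmf, a, b, c):
    """I(A; B | C) in bits, where a, b, c are tuples of coordinate indices."""
    p_abc = marginal(pmf, a + b + c)
    p_ac = marginal(pmf, a + c)
    p_bc = marginal(pmf, b + c)
    p_c = marginal(pmf, c)
    total = 0.0
    for key, p in p_abc.items():
        if p == 0.0:
            continue
        ka = key[:len(a)]
        kb = key[len(a):len(a) + len(b)]
        kc = key[len(a) + len(b):]
        total += p * log2(p * p_c[kc] / (p_ac[ka + kc] * p_bc[kb + kc]))
    return total


def directed_info(pmf, L):
    """I(Xhat^L -> X^L) = sum_i I(X_i; Xhat^i | X^{i-1}) in bits."""
    total = 0.0
    for i in range(1, L + 1):
        x_i = (i - 1,)                         # coordinate of X_i
        xhat_upto_i = tuple(range(L, L + i))   # coordinates of Xhat^i
        x_past = tuple(range(i - 1))           # coordinates of X^{i-1}
        total += cond_mutual_info(pmf, x_i, xhat_upto_i, x_past)
    return total


if __name__ == "__main__":
    # Two uniform independent bits reproduced exactly by the decoder.
    pmf = {(x1, x2, x1, x2): 0.25 for x1 in (0, 1) for x2 in (0, 1)}
    print(directed_info(pmf, 2))
```

In this toy case the estimate equals the source, so each of the two terms contributes one bit.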

Example 1. Consider a binary source with iBM of length 
$L = 2$ and block-feedforward such that variables $X_i$, for all 
odd $i$, are i.i.d. Bern($p$), with $0 \le p \le 0.5$, while for all even 
$i$ we have $X_i = X_{i-1} \oplus Q_i$ with $Q_i$ being i.i.d. Bern($q$), 
with $0 \le q \le 0.5$ and independent of $X_i$ for all odd $i$. 
Assuming Hamming distortion $d^2(x^2, \hat{x}^2) = \sum_{i=1}^{2} 1(x_i \ne \hat{x}_i)$, 
from Corollary 2 we easily obtain that, if $D \le (p + q)/2$, the 
rate-distortion function is given as 

$$\min \frac{1}{2} \left[ H_2(p) - H_2(D_1) + H_2(q) - H_2(D_2) \right] \qquad (29)$$

where the minimization is over the per-symbol distortions $D_1$ and 
$D_2$ with $(D_1 + D_2)/2 \le D$, under the constraints $D_1 \le p$ and 
$D_2 \le q$, and is zero otherwise. 
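The minimization in (29) is easy to carry out numerically. The Python sketch below is a simple grid search under two stated assumptions: the per-symbol distortions satisfy the average constraint $(D_1 + D_2)/2 \le D$, and this constraint is met with equality at the optimum (which holds because $H_2$ is increasing on $[0, 0.5]$). The function names and the grid resolution are hypothetical.

```python
from math import log2


def h2(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def rate_example1(p, q, D, steps=20000):
    """Grid-search evaluation of (29) for 0 <= p, q <= 0.5."""
    if D >= (p + q) / 2:
        return 0.0
    total = 2 * D                  # D1 + D2 at the optimum
    lo = max(0.0, total - q)       # keeps D2 <= q
    hi = min(p, total)             # keeps D1 <= p and D2 >= 0
    best = float("inf")
    for k in range(steps + 1):
        d1 = lo + (hi - lo) * k / steps
        d2 = total - d1
        rate = 0.5 * (h2(p) - h2(d1) + h2(q) - h2(d2))
        best = min(best, rate)
    return best


if __name__ == "__main__":
    print(rate_example1(0.5, 0.5, 0.1))
```

For $p = q$, the search returns the symmetric split $D_1 = D_2 = D$, as expected from the concavity of $H_2$.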

D. Side Information Repeat Request 

Consider the situation in which the decoder at any time $i$, 
upon the observation of the side information $Y_{i1}$, can decide 
whether to take a second measurement of the side information, 
thus paying the associated cost, or not. To elaborate, assume a 
memoryless source $X^n$ with distribution $P(x)$. At any time $i$, 
the first observation $Y_{i1}$ of the side information is distributed 
according to the memoryless channel $P(y_1 \mid x)$ when the input 
is $X_i = x$, while the second observation $Y_{i2}$ depends on the 
action $A_i = a$ via the memoryless channel $P(y_2 \mid x, a)$ with 
input $X_i = x$. 

This scenario can be easily seen to be a special case of the 
model under study with iBM of size $L = 2$. The corresponding 
FDG is illustrated in Fig. 7. By comparing this FDG with the 
general FDG in Fig. 4, it is seen that the model under study 
in this section can be obtained from the one presented in Sec. 
I-A by appropriately setting the alphabets of a given subset of 
variables to empty sets and by relabeling. 

A characterization of the rate-distortion-cost function can 
be easily derived as a special case of Proposition 1. Here 
we focus on a specific simple example. In particular, we 
assume that the channel $P(y_1 \mid x)$ for the first measurement 
is an erasure channel with erasure probability $\epsilon$. Moreover, 
the channel $P(y_2 \mid x, a)$ for the second measurement is an 
independent and identical erasure channel if $a = 1$, while 
it produces $Y_2$ equal to the erasure symbol with probability 
1 if $a = 0$. In other words, the action $a = 1$ corresponds 
to performing a second measurement of the side information 
over an independent realization of the same erasure channel. 

Fig. 7. FDG for the model with side information repeat request. 

It is apparent that, if $Y_1 = X$, one can set $A = 0$ without 
loss of optimality. Instead, if $Y_1$ equals the erasure symbol, 
then, in the absence of action cost constraints, it is clearly 
optimal to set $A = 1$. In so doing, the side information channel 
is converted into an equivalent erasure channel with erasure 
probability $\epsilon^2$. Therefore, the rate-distortion function is given by [10] 

$$R(D, \Gamma) = \epsilon^2 \left( 1 - H_2\left( \frac{D}{\epsilon^2} \right) \right) \qquad (30)$$

for $D \le \epsilon^2/2$ and zero otherwise, as long as the action 
cost budget $\Gamma$ is large enough. More specifically, given the 
discussion above, it can be seen that $\Gamma \ge \epsilon$ suffices to achieve 
(30). 
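As a quick numerical check of the repeat-request example, the following Python sketch evaluates the rate expression under the assumption that it reads $\epsilon^2 (1 - H_2(D/\epsilon^2))$ for $D \le \epsilon^2/2$ and zero otherwise, and that the cost budget is large enough for the second measurement to be requested after every erasure. The function names are hypothetical.

```python
from math import log2


def h2(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def rate_repeat_request(D, eps):
    """Rate with an equivalent erasure probability eps**2 (large cost budget)."""
    e2 = eps ** 2
    if e2 == 0.0 or D >= e2 / 2:
        return 0.0
    return e2 * (1.0 - h2(D / e2))


def rate_single_measurement(D, eps):
    """Same expression with erasure probability eps (no second measurement)."""
    if eps == 0.0 or D >= eps / 2:
        return 0.0
    return eps * (1.0 - h2(D / eps))


if __name__ == "__main__":
    for D in (0.0, 0.05, 0.1):
        print(D, rate_repeat_request(D, 0.5), rate_single_measurement(D, 0.5))
```

Comparing the two functions shows the rate saving obtained by paying for the second measurement: with $\epsilon = 0.5$ and $D = 0$, the required rate drops from 0.5 to 0.25 bits per symbol.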

IV. Concluding Remarks 

Models with in-block memory (iBM), first proposed in the 
context of channel coding problems in [2] and here for source 
coding, provide tractable extensions of standard memoryless 
models. Specifically, in this paper, we have presented results 
for a point-to-point system with controllable side information 
at the receiver and iBM. Interesting generalizations include the 
investigation of multi-terminal models. 
Appendix A 
Proof of the Converse of Proposition 1 

For any code achieving rate $R$ with distortion $D$ and cost 
$\Gamma$, we have the following series of inequalities: 

$$nR \ge H(W) = I(W; X^n)$$
$$\overset{(a)}{=} \sum_{i=1}^{m} H(X_i^L) - \sum_{i=1}^{m} H(X_i^L \mid X^{(i-1)L}, W)$$
$$\overset{(b)}{=} \sum_{i=1}^{m} H(X_i^L) - \sum_{i=1}^{m} H(X_i^L \mid X^{(i-1)L}, W, Y^{(i-1)L})$$
$$\overset{(c)}{=} \sum_{i=1}^{m} H(X_i^L) - \sum_{i=1}^{m} H(X_i^L \mid X^{(i-1)L}, W, Y^{(i-1)L}, J_i^{L+1})$$
$$\overset{(d)}{\ge} \sum_{i=1}^{m} H(X_i^L) - \sum_{i=1}^{m} H(X_i^L \mid J_i^{L+1})$$
$$\overset{(e)}{=} mH(X^L) - mH(X^L \mid J_T^{L+1}, T)$$
$$\ge mI(X^L; J^{L+1}),$$

where (a) follows due to the block memory of the source $X^n$; 
(b) follows due to the Markov chain $X_i^L - (X^{(i-1)L}, W) - 
Y^{(i-1)L}$; (c) is obtained by defining $J_i^{L+1}$ as the subtree of 
$J^{n+1}$ corresponding to $y^{(i-1)L}$, and noting that 
$J_i^{L+1}$ is a function of $(W, Y^{(i-1)L})$; (d) is due to the fact 
that conditioning cannot increase entropy; (e) is obtained by 
defining a random variable $T$ uniformly distributed in the set 
$[1, m]$ and independent of all other variables, and also the 
variables $J^{L+1} = J_T^{L+1}$ and $X^L = X_T^L$, and using the fact 
that the distribution of $X_i^L$ does not depend on $i$. The last 
inequality follows again because conditioning cannot increase entropy. 

Given the definitions above, and setting $A^L = A_T^L$, the joint 
distribution of the random variables at hand factorizes as 

$$P(x^L) P(j^{L+1} \mid x^L) \, 1(a^L \| j^L, 0y^{L-1}) \cdot 1(\hat{x}^L \| j_2^{L+1}, y^L) P(y^L \| a^L \mid x^L), \qquad (31)$$

where we have defined $P(j^{L+1} \mid x^L) = 
\frac{1}{m} \sum_{t=1}^{m} P(J_t^{L+1} = j^{L+1} \mid X_t^L = x^L)$. Note that, in showing (31), it is 
critical that, as per (5), the side information $Y_i^L$ in the $i$th 
block depends only on the actions in the $i$th block. The proof 
is concluded by noting that the defined random variables 
also satisfy the constraints (18) and (19) due to the fact that 
any code at hand must satisfy the conditions (11) and (12), 
respectively. 

Appendix B 
Proof of Corollary 1 

Assume that a rate is achievable for some distribution 
$P(j^{L+1} \mid x^L)$, where the cardinality of $J^{L+1}$ is limited only by 
the count of available codetrees as in (22). We want to show 
that the same rate can be achieved by limiting the alphabet of 
available codetrees as in (23). To this end, we first write the 
joint distribution (17) as 

$$P(j^{L+1}) P(x^L \mid j^{L+1}) \, 1(a^L \| j^L, 0y^{L-1}) \cdot 1(\hat{x}^L \| j_2^{L+1}, y^L) P(y^L \| a^L \mid x^L). \qquad (32)$$

Now, fix the so obtained distribution $P(x^L \mid j^{L+1})$ and recall 
that the other terms in (32) are also fixed by the problem def- 
inition. Now, the quantities appearing in Proposition 1 can be 
written as convex combinations of functions of the terms fixed 
above, in which the distribution $P(j^{L+1})$ defines the coeffi- 
cients of the combinations. Specifically, we have: (i) the distri- 
bution $P(x^L) = \sum_{j^{L+1}} P(j^{L+1}) P(x^L \mid j^{L+1})$ for all $x^L \in \mathcal{X}^L$ 
(but one), which fixes $H(X^L)$; (ii) the conditional entropy 
$H(X^L \mid J^{L+1}) = \sum_{j^{L+1}} P(j^{L+1}) H(X^L \mid J^{L+1} = j^{L+1})$; and 
(iii) the averages $E[d^L(X^L, \hat{X}^L)]$ and $E[\gamma^L(A^L, X^L)]$. It 
follows by the Caratheodory theorem that we can limit the 
alphabet of $J^{L+1}$ as in (23) without loss of optimality. 